Skip to content

Make trait TMerkleTreeNode dyn compatible (fka object safe)#406

Open
malcolmgreaves wants to merge 8 commits intomainfrom
mg/1_TMerkleTreeNode_object_safety
Open

Make trait TMerkleTreeNode dyn compatible (fka object safe)#406
malcolmgreaves wants to merge 8 commits intomainfrom
mg/1_TMerkleTreeNode_object_safety

Conversation

@malcolmgreaves
Copy link
Copy Markdown
Collaborator

First in a series of PRs to abstract the Merkle tree store so that we can provide different
backing implementations other than the existing custom file format based implementation.

There's no change in external behavior in this PR: it is a refactor.

This PR changes drops the + Serialize constraint to trait TMerkleTreeNode to make it
dyn Compatible (this was formally known as object safe). Serialize has a method
serialize<S: Serializer> that prevents it from being compiled as a virtual table lookup,
hence it's not dyn Compatible. [1]

The change here drops the + Serialize constraint from the trait and instead adds a new
method to_msgpack_bytes() -> Result<Vec<u8>, ...>, which performs the serialization.
The trait also introduces a new SerializationError associated type, which is the error type
returned by to_msgpack_bytes(). There's a blanket implementation that implements this
updated trait for all existing concrete node implementations: it uses their Serialize impl.
to provide a generic implementation for to_msgpack_bytes with a shared serialization
error of rmp_serde::decode::Error.

Sites that were manually calling serialize on the node now call to_msgpack_bytes().
There's a new conversion of decode::Error into an OxenError wrapper.

Functions using nodes that used the TMerkleTreeNode trait have been updated with
new type constraints on the generic type that implements the trait. Specifically, there's
constraints to ensure that the SerializationError can be converted into an OxenError
when appropriate. And p_write_tree has been updated to ensure that the FileNode,
DirNode, and VNode concrete implementations all have TMerkleTreeNode impls
that have the same SerializationError. This works because all use decode::Error.

[1] https://doc.rust-lang.org/reference/items/traits.html#r-items.traits.dyn-compatible

- modified trait to use associated type for error
- to_msgpack_bytes returns result w/ that error or vec<u8>
- blanket impl uses something that also has serialize as to_msgpack_bytes implmentation
- made `from_u8` return error & added it to the OxenError heirarchy

WIP propigate changes
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

Note

Currently processing new changes in this PR. This may take a few minutes, please wait...

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1e3e88d7-ac2e-487d-9896-76f4d9ab3dd4

📥 Commits

Reviewing files that changed from the base of the PR and between 72fc293 and 4b3d9a4.

📒 Files selected for processing (1)
  • crates/lib/src/model/merkle_tree/node_type.rs
 ______________________________________________________________________________________________________________________________________________________________________
< Refactor early, refactor often. Just as you might weed and rearrange a garden, rewrite, rework, and re-architect code when it needs it. Fix the root of the problem. >
 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  \
   \   \
        \ /\
        ( )
      .( o ).
📝 Walkthrough

Walkthrough

Refactors Merkle node serialization and error handling: introduces fallible node-type parsing and a SerializationError associated type on TMerkleTreeNode, replaces panics with Result-based MsgPack encode/decode propagation, removes empty per-node TMerkleTreeNode impls in favor of a blanket impl, and updates callers and DB/repo code to use generics and error conversions.

Changes

Cohort / File(s) Summary
Error enum
crates/lib/src/error.rs
Added OxenError::MerkleTreeError(InvalidMerkleTreeNodeType) and OxenError::RmpEncodeError(rmp_serde::encode::Error).
Node type & trait refactor
crates/lib/src/model/merkle_tree/node_type.rs
Added InvalidMerkleTreeNodeType; made from_u8 fallible and from_u8_unwrap for legacy panic behavior; redesigned TMerkleTreeNode with SerializationError associated type and to_msgpack_bytes(); added blanket impl for Serialize types.
Node deserialization signatures
crates/lib/src/model/merkle_tree/node/...
crates/lib/src/model/merkle_tree/node/{commit_node.rs, dir_node.rs, file_chunk_node.rs, file_node.rs, vnode.rs, merkle_tree_node.rs}
Changed various deserialize/deserialize_id signatures to return rmp_serde::decode::Error instead of OxenError; removed empty impl TMerkleTreeNode blocks from node types.
DB layer updates
crates/lib/src/core/db/merkle_node/merkle_node_db.rs
Converted decoding to use from_u8_unwrap in some paths; changed to_node to return rmp decode error; replaced direct serialize(...).unwrap() with to_msgpack_bytes()?; generalized node-related APIs to N: TMerkleTreeNode and added where OxenError: From<N::SerializationError> bounds.
Repo/tree writer generics
crates/lib/src/repositories/tree.rs
Refactored private p_write_tree to explicit generics <N, S> and added where constraints linking N::SerializationError to OxenError for conversion; adjusted recursive calls to use same N.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • rpschoenburg
  • gschoeni

Poem

🐰 I nibble bytes and tidy seams,
Replacing panics with gentle streams.
Nodes now chat in careful code,
MsgPack whispers down the road.
Hop on—serialization beams! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: making the TMerkleTreeNode trait dyn-compatible by removing the Serialize constraint.
Description check ✅ Passed The description is directly related to the changeset, explaining the rationale, implementation details, and impact of making TMerkleTreeNode dyn-compatible.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mg/1_TMerkleTreeNode_object_safety

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
crates/lib/src/error.rs (1)

315-317: Consider adding RmpEncodeError to the internal error hints.

The RmpDecodeError is included in the hint() method (Line 381) as an internal error, but the new RmpEncodeError is not. For consistency, encoding errors should likely receive the same hint.

🔧 Suggested fix
             DB(_) | ArrowError(_) | BinCodeError(_) | RedisError(_) | R2D2Error(_)
-            | RmpDecodeError(_) => {
+            | RmpDecodeError(_) | RmpEncodeError(_) => {
                 "This is an internal error. Run with RUST_LOG=debug for more details."
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/error.rs` around lines 315 - 317, Add RmpEncodeError to the
internal error hints returned by the hint() method so encoding errors get the
same diagnostic guidance as decoding errors; locate the hint() implementation
and where it currently matches RmpDecodeError and include RmpEncodeError in that
same arm (or add a separate arm that returns the same internal error hint) so
both rmp_serde::encode::Error and rmp_serde::decode::Error map to the internal
error hint.
crates/lib/src/repositories/tree.rs (1)

1126-1128: Consider returning an error instead of panicking.

The panic! on unexpected node types could be replaced with an Err return for more graceful error handling, though this represents a programming error rather than a runtime condition.

🔧 Optional fix
             EMerkleTreeNode::File(file_node) => {
                 db.add_child(file_node)?;
             }
-            node => {
-                panic!("p_write_tree Unexpected node type: {node:?}");
-            }
+            EMerkleTreeNode::Commit(_) | EMerkleTreeNode::FileChunk(_) => {
+                return Err(OxenError::basic_str(format!(
+                    "p_write_tree unexpected node type: {:?}",
+                    child.node.node_type()
+                )));
+            }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/repositories/tree.rs` around lines 1126 - 1128, Replace the
panic in the catch-all match arm inside p_write_tree with a propagated error
return: instead of panic!("p_write_tree Unexpected node type: {node:?}"), return
an Err variant (using the crate's repository error type or anyhow::Error) that
includes a descriptive message and the debug of node so callers can handle it;
update the function signature to return Result if needed and adjust call sites
to propagate the error via ? or map_err so this programming-error case is
reported without aborting the process.
crates/lib/src/core/db/merkle_node/merkle_node_db.rs (1)

257-264: ?Sized bounds needed only if trait objects are intended for these entry points.

The current code's implicit Sized bounds on N would reject &dyn TMerkleTreeNode<...> at these call sites. However, all current callers in p_write_tree pass concrete types (vnode, dir_node, file_node), so this is not a current issue.

If the design goal is to allow trait-object dispatch through these helpers, add ?Sized:

Suggested changes (if needed)
-    pub fn open_read_write_if_not_exists<N: TMerkleTreeNode>(
+    pub fn open_read_write_if_not_exists<N: TMerkleTreeNode + ?Sized>(
-    pub fn open_read_write<N: TMerkleTreeNode>(
+    pub fn open_read_write<N: TMerkleTreeNode + ?Sized>(
-    fn write_node<N: TMerkleTreeNode>(
+    fn write_node<N: TMerkleTreeNode + ?Sized>(
-    pub fn add_child<N: TMerkleTreeNode>(&mut self, item: &N) -> Result<(), OxenError>
+    pub fn add_child<N: TMerkleTreeNode + ?Sized>(&mut self, item: &N) -> Result<(), OxenError>

Applies to: lines 257–264, 277–284, 373–380, 424–427.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs` around lines 257 - 264,
Update the generic bounds to allow trait-object dispatch by adding ?Sized to the
N type parameter where needed: change the signatures of
open_read_write_if_not_exists, (and the other helper functions flagged in the
review at the same pattern) so the generic bound becomes N: TMerkleTreeNode +
?Sized and keep the parameter as node: &N; leave the existing where clauses
(e.g., OxenError: From<N::SerializationError>) intact so SerializationError
still resolves for unsized types.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs`:
- Around line 331-334: The code currently uses
MerkleTreeNodeType::from_u8_unwrap on bytes read from disk (e.g., in the lookup
mapping shown) which will panic on invalid bytes; change these sites (the
occurrence in the lookup mapping and the similar uses at the other noted
locations) to call the fallible MerkleTreeNodeType::from_u8 and propagate the
error instead of unwrapping so open()/map() return Err for invalid node-type
bytes; update the surrounding logic in the functions handling disk reads (e.g.,
open(), map()) to propagate the Result from from_u8 rather than assuming a valid
value.

---

Nitpick comments:
In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs`:
- Around line 257-264: Update the generic bounds to allow trait-object dispatch
by adding ?Sized to the N type parameter where needed: change the signatures of
open_read_write_if_not_exists, (and the other helper functions flagged in the
review at the same pattern) so the generic bound becomes N: TMerkleTreeNode +
?Sized and keep the parameter as node: &N; leave the existing where clauses
(e.g., OxenError: From<N::SerializationError>) intact so SerializationError
still resolves for unsized types.

In `@crates/lib/src/error.rs`:
- Around line 315-317: Add RmpEncodeError to the internal error hints returned
by the hint() method so encoding errors get the same diagnostic guidance as
decoding errors; locate the hint() implementation and where it currently matches
RmpDecodeError and include RmpEncodeError in that same arm (or add a separate
arm that returns the same internal error hint) so both rmp_serde::encode::Error
and rmp_serde::decode::Error map to the internal error hint.

In `@crates/lib/src/repositories/tree.rs`:
- Around line 1126-1128: Replace the panic in the catch-all match arm inside
p_write_tree with a propagated error return: instead of panic!("p_write_tree
Unexpected node type: {node:?}"), return an Err variant (using the crate's
repository error type or anyhow::Error) that includes a descriptive message and
the debug of node so callers can handle it; update the function signature to
return Result if needed and adjust call sites to propagate the error via ? or
map_err so this programming-error case is reported without aborting the process.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 823320ab-2b3e-4dfe-81e9-95cc6d71f313

📥 Commits

Reviewing files that changed from the base of the PR and between 60389ea and ce6c05f.

📒 Files selected for processing (10)
  • crates/lib/src/core/db/merkle_node/merkle_node_db.rs
  • crates/lib/src/error.rs
  • crates/lib/src/model/merkle_tree/node/commit_node.rs
  • crates/lib/src/model/merkle_tree/node/dir_node.rs
  • crates/lib/src/model/merkle_tree/node/file_chunk_node.rs
  • crates/lib/src/model/merkle_tree/node/file_node.rs
  • crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
  • crates/lib/src/model/merkle_tree/node/vnode.rs
  • crates/lib/src/model/merkle_tree/node_type.rs
  • crates/lib/src/repositories/tree.rs

Comment on lines 331 to 334
let dtype = lookup
.as_ref()
.map(|l| MerkleTreeNodeType::from_u8(l.data_type))
.map(|l| MerkleTreeNodeType::from_u8_unwrap(l.data_type))
.unwrap_or(MerkleTreeNodeType::Commit);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Return invalid node-type bytes as Err, not panics.

These bytes come straight from the node files, so from_u8_unwrap() means one corrupt record—or a repo written with a newer node variant—can abort open()/map(). This PR already introduced a fallible from_u8() for exactly this case, and these methods already return Result, so the read path should propagate the error instead of crashing.

💡 Suggested fix
-        let dtype = lookup
-            .as_ref()
-            .map(|l| MerkleTreeNodeType::from_u8_unwrap(l.data_type))
-            .unwrap_or(MerkleTreeNodeType::Commit);
+        let dtype = lookup
+            .as_ref()
+            .map(|l| MerkleTreeNodeType::from_u8(l.data_type))
+            .transpose()?
+            .unwrap_or(MerkleTreeNodeType::Commit);
-        let data_type = MerkleTreeNodeType::from_u8_unwrap(lookup.data_type);
+        let data_type = MerkleTreeNodeType::from_u8(lookup.data_type)?;
-            let dtype = MerkleTreeNodeType::from_u8_unwrap(*dtype);
+            let dtype = MerkleTreeNodeType::from_u8(*dtype)?;

Also applies to: 510-510, 529-529

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs` around lines 331 - 334,
The code currently uses MerkleTreeNodeType::from_u8_unwrap on bytes read from
disk (e.g., in the lookup mapping shown) which will panic on invalid bytes;
change these sites (the occurrence in the lookup mapping and the similar uses at
the other noted locations) to call the fallible MerkleTreeNodeType::from_u8 and
propagate the error instead of unwrapping so open()/map() return Err for invalid
node-type bytes; update the surrounding logic in the functions handling disk
reads (e.g., open(), map()) to propagate the Result from from_u8 rather than
assuming a valid value.

Copy link
Copy Markdown
Contributor

@CleanCut CleanCut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can reduce the complexity here quite a bit and still accomplish the goal!

Comment on lines +1095 to +1109
// we make sure we can convert the node's serialization error into an OxenError
OxenError: From<N::SerializationError>,
// we make sure that the node type implements TMerkleTreeNode
N: TMerkleTreeNode<SerializationError = S>,
// and we make sure that the nodes we're serializing all have the same serialization error type
VNode: TMerkleTreeNode<SerializationError = S>,
DirNode: TMerkleTreeNode<SerializationError = S>,
FileNode: TMerkleTreeNode<SerializationError = S>,
// NOTE:
// We could have dropped everything here, have no `S` generic, and have this in the where:
// N: TMerkleTreeNode<SerializationError = rmp_serde::encode::Error>,
// But, by doing it this way, we make it so that we can change the actual SerializationError
// type without needing to change this function's type. It works so long as the error type
// for all of the node's trait implementations align.
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bounds complexity here should mostly (completely?) go away with the elimination of SerializationError.

We wouldn't need the `S` generic anyway, but that's irrelevant now.

My original comment before I saw that we could eliminate SerializationError:

Go ahead and drop the S generic and simplify this.

The extra bounds complexity is unnecessary. All nodes share the same SerializationError type already, and if we change it, we'll change it everywhere, including here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an intentional part of the design. I don't want to overly rely on using OxenError, because that is a large enum where nearly all of the variants are not applicable to the code. There's only a single error -- serialization problem.

An explicit design I want to do more of is keep code as locally scoped as possible. Here, that meant only using a single serialization error type.

There's one way to simplify this, and that's to hardcode the serialization error type as a serde error. However, given the way I've documented this and explained it, and used names like "SerializationError", I feel that this is clear. The benefit is it gives the code the ability to switch to a new SerializationError without requiring this code to change. Which is good, since since this code is agnostic to the serialization details! :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this was a deliberate design choice. My feedback is that it is not a good direction to go in.

I don't want to overly rely on using OxenError, because that is a large enum where nearly all of the variants are not applicable to the code.

OxenError exists so we have a consistent error type to use throughout the liboxen codebase. In other words, it is intended that we use OxenError throughout liboxen. In this context, there is no overly relying on it. I'd encourage us to lean into using it rather than working around it. There is nothing wrong with the error enum having a large number of variants. Fragmenting our error approach at this point is the wrong direction to go in.

This code will still be agnostic to serialization details using OxenError.

Hardcoding an error type specific to a single implementation into a trait is not a simplification.

Comment on lines +92 to +97
/// Deserialize a `u8` value into a `MerkleTreeNodeType`.
/// Panics if the `u8` value is not a valid `MerkleTreeNodeType`.
pub fn from_u8_unwrap(val: u8) -> MerkleTreeNodeType {
Self::from_u8(val).expect("Invalid MerkleTreeNodeType: {val}")
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use from_u8 instead. Returning an error is the correct thing to do in every place that this panicky function is used.

Suggested change
/// Deserialize a `u8` value into a `MerkleTreeNodeType`.
/// Panics if the `u8` value is not a valid `MerkleTreeNodeType`.
pub fn from_u8_unwrap(val: u8) -> MerkleTreeNodeType {
Self::from_u8(val).expect("Invalid MerkleTreeNodeType: {val}")
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would like to make this change, that's why I'm introducing it here :) I thought I got all of these -- let me double check the call sites.

Comment on lines +48 to +51
#[derive(Debug, Error)]
#[error("Deserialization failure: Invalid MerkleTreeNodeType: {0}")]
/// Failed to deserialize a `MerkleTreeNodeType` due to unknown `u8` value.
pub struct InvalidMerkleTreeNodeType(u8);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a new OxenError variant.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a variant that wraps this. Keeping it separate is intentional -- it makes it clearer that code is only returning this as an error versus every single possible variant of an OxenError. It's used in the to/from u8 for the merkle tree node type. That code should only ever return this kind of error. I want to make that constraint clear in the types, hence why it's a standalone struct error here and there's a #[from] wrapper in OxenError for it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I understand your choices are intentional. My feedback is that there is a better approach.

it makes it clearer that code is only returning this as an error versus every single possible variant of an OxenError...

My comment from before about leaning into OxenError and many variants being okay applies here.

That code should only ever return this kind of error. I want to make that constraint clear in the types...

I don't know a good reason to assert that unknown implementations should only return one type of error.

Even if all implementations did return one type of error, or even this specific error, we're already following the industry best practice of using a single enum error type for our liboxen library. I don't see a reason to fragment our error handling approach.

Comment on lines +137 to +141
/// The error type that can occur during serialization.
type SerializationError: std::error::Error + Send + Sync + 'static;

/// Serialize this node to a MsgPack byte array.
fn to_msgpack_bytes(&self) -> Result<Vec<u8>, Self::SerializationError>;
Copy link
Copy Markdown
Contributor

@CleanCut CleanCut Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need an associated type here. Use OxenError. That should help eliminate a lot of the extra complexity!

Suggested change
/// The error type that can occur during serialization.
type SerializationError: std::error::Error + Send + Sync + 'static;
/// Serialize this node to a MsgPack byte array.
fn to_msgpack_bytes(&self) -> Result<Vec<u8>, Self::SerializationError>;
/// Serialize this node to a MsgPack byte array.
fn to_msgpack_bytes(&self) -> Result<Vec<u8>, OxenError>;

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly not using OxenError. (I explained this in an earlier comment above, but I'll reiterate a bit here) That is too wide of a type -- it includes too many variants that should never be used here. Using it as the error type is confusing and allows the code to return invalid errors. It's only ever valid for it to return an error related to serialization.

This design is actually simpler if we stop to think about what all of this syntax is saying -- it's saying "this function only returns Vec<u8> or some single kind of error."

One issue with the oxen code today is it's very hard to understand what exact kind of error some piece of code returns if it has a Result<_, OxenError>. With this design, it's straightforward to see what specific error type the code returns by only looking at the signature (instead of reading through the entire definition and tracking down all of the possible variants of OxenError that are returned).

In turn, this design makes all of the concrete implementation code simpler: they all use rmp_serde::decode::Error instead of having OxenError. So we can look at those signatures and see "ok, the only thing that can go wrong is a serialization error" instead of "it can be any variant of an OxenError, which one(s) is it actually going to be?"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a large number of variants in the error enum isn't a problem. It's pretty normal. It isn't clear to me why you find this confusing. On the other hand, having to search for scattered errors around the codebase and figure out when and where to use them would quickly become confusing.

The argument for dividing our single error enum into two error enums would be that there are two disjoint sets of errors that are used in mutually exclusive call stacks. We don't even have that situation.

One issue with the oxen code today is it's very hard to understand what exact kind of error some piece of code returns if it has a Result<_, OxenError>.

I don't think that is an issue. My observation has been that almost nothing that handles an error cares which specific error it was. In the few places where we do care, we match on the OxenError variant that we care about. Where's the issue? It's already straightforward to see the variant type. Isn't that why we've been replacing the BasicString variants with more specific variants, at your request--just in case someone cares in the future and wants to match on the variant?

In turn, this design makes all of the concrete implementation code simpler: they all use rmp_serde::decode::Error instead of having OxenError.

Your design makes the concrete implementation more complex. All the bounds stuff goes away when you switch to OxenError.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree/node_type.rs (1)

157-182: The new test still doesn't pin the two regressions this refactor cares about.

Right now it only proves concrete impls exist and their errors convert into OxenError. It won't fail if TMerkleTreeNode stops being dyn-compatible again, and it doesn't lock the repository-critical u8 mapping / invalid-byte path described above.

Proposed test additions
     #[test]
     fn test_nodes_implement_trait() {
         /// this only exists so we can check that all node types implement `TMerkleTreeNode`
         /// it will be a compile-time error if a node type does not implement the trait
-        fn is_tmerkletreenode<T: TMerkleTreeNode>(_: T)
+        fn is_tmerkletreenode<T: TMerkleTreeNode>(node: &T)
         where
             OxenError: From<T::SerializationError>,
         {
+            let _: &dyn TMerkleTreeNode<SerializationError = T::SerializationError> = node;
         }
 
-        is_tmerkletreenode(CommitNode::default());
-        is_tmerkletreenode(DirNode::default());
-        is_tmerkletreenode(VNode::default());
-        is_tmerkletreenode(FileNode::default());
-        is_tmerkletreenode(FileChunkNode::default());
+        is_tmerkletreenode(&CommitNode::default());
+        is_tmerkletreenode(&DirNode::default());
+        is_tmerkletreenode(&VNode::default());
+        is_tmerkletreenode(&FileNode::default());
+        is_tmerkletreenode(&FileChunkNode::default());
     }
+
+    #[test]
+    fn test_node_type_u8_mapping_is_stable() {
+        for (node_type, value) in [
+            (MerkleTreeNodeType::Commit, 0u8),
+            (MerkleTreeNodeType::Dir, 1u8),
+            (MerkleTreeNodeType::VNode, 2u8),
+            (MerkleTreeNodeType::File, 3u8),
+            (MerkleTreeNodeType::FileChunk, 4u8),
+        ] {
+            assert_eq!(node_type.to_u8(), value);
+            assert_eq!(
+                MerkleTreeNodeType::from_u8(value).expect("valid node type"),
+                node_type
+            );
+        }
+    }
+
+    #[test]
+    fn test_invalid_node_type_returns_error() {
+        assert!(matches!(
+            MerkleTreeNodeType::from_u8(u8::MAX),
+            Err(InvalidMerkleTreeNodeType(_))
+        ));
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/model/merkle_tree/node_type.rs` around lines 157 - 182, Add
two stronger tests: (1) verify TMerkleTreeNode is object-safe by defining a
helper that accepts &dyn TMerkleTreeNode (e.g., fn takes_dyn(_: &dyn
TMerkleTreeNode) {}) and passing references like &CommitNode::default(),
&DirNode::default(), &VNode::default(), &FileNode::default(),
&FileChunkNode::default() to it; (2) pin the u8 mapping by exercising the
NodeType (or equivalent) conversion from bytes (e.g., TryFrom<u8> / from_u8) and
asserting each concrete node variant maps to the expected byte value and that
out-of-range/invalid bytes produce an Err or InvalidByte result. Reference
TMerkleTreeNode, the existing is_tmerkletreenode test, and the NodeType/from-u8
conversion functions to locate where to add these checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/lib/src/model/merkle_tree/node_type.rs`:
- Around line 136-155: TMerkleTreeNode now adds an associated type
SerializationError and a method to_msgpack_bytes which breaks any manual impls;
update the crate by (1) adding clear migration docs and CHANGELOG entry
describing how to change any manual impls to declare type SerializationError =
rmp_serde::encode::Error and implement to_msgpack_bytes (or rely on the provided
blanket impl for types implementing Serialize + MerkleTreeNodeIdType + Debug +
Display), (2) mark this change as a major-version bump / deprecation period for
users with custom impls, and (3) optionally provide a short compatibility
note/example showing the exact replacement impl for TMerkleTreeNode (using type
SerializationError and the to_msgpack_bytes body from the blanket impl) so
external consumers can update easily.

---

Nitpick comments:
In `@crates/lib/src/model/merkle_tree/node_type.rs`:
- Around line 157-182: Add two stronger tests: (1) verify TMerkleTreeNode is
object-safe by defining a helper that accepts &dyn TMerkleTreeNode (e.g., fn
takes_dyn(_: &dyn TMerkleTreeNode) {}) and passing references like
&CommitNode::default(), &DirNode::default(), &VNode::default(),
&FileNode::default(), &FileChunkNode::default() to it; (2) pin the u8 mapping by
exercising the NodeType (or equivalent) conversion from bytes (e.g., TryFrom<u8>
/ from_u8) and asserting each concrete node variant maps to the expected byte
value and that out-of-range/invalid bytes produce an Err or InvalidByte result.
Reference TMerkleTreeNode, the existing is_tmerkletreenode test, and the
NodeType/from-u8 conversion functions to locate where to add these checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 80e1e045-7d88-49f3-91f8-12027386cb23

📥 Commits

Reviewing files that changed from the base of the PR and between ce6c05f and 72fc293.

📒 Files selected for processing (1)
  • crates/lib/src/model/merkle_tree/node_type.rs

@malcolmgreaves malcolmgreaves force-pushed the mg/1_TMerkleTreeNode_object_safety branch from 72fc293 to 4b3d9a4 Compare April 1, 2026 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants